Abstract

Data replication technologies enable efficient and highly available data access, and are thus gaining
more and more interest in both academia and industry. However, data replication
introduces the problem of data consistency. Strong consistency conditions, such as linearizability
and sequential consistency, are expensive to guarantee. Hence researchers have proposed weak
consistency conditions, to enable tradeoffs between data access cost and data consistency. In
this work, we focus on PRAM consistency. PRAM consistency is weak in that it does not
require all the processes to agree on the same view of the order in which the data operations
occur. To determine whether data replication systems indeed provide PRAM consistency, we
work on the problem of verifying PRAM consistency over Read/Write traces. We first propose
the RW-Closure algorithm, which models replica Read/Write operations as graph nodes, and
ordering relations between operations as directed edges. PRAM consistency conditions can
then be captured by three rules of iteratively adding edges to the transitive closure of the graph.
The worst-case complexity of the RW-Closure algorithm is $O(n^5)$, where $n$ is the total number
of operations in the Read/Write trace. We then propose the Read-Centric algorithm, which
reduces the complexity to $O(n^4)$. It restricts the applications of the three rules to a Read-induced
subgraph and manages them in topological order, utilizing the fact that all Reads are
on one process, as required in PRAM consistency. The experimental evaluation shows the
efficiency and the scalability of the two proposed algorithms.
1 Introduction

Data replication consists of maintaining multiple copies of critical data, called replicas, on separate
computing entities. It is a critical enabling technology in distributed systems, improving both system
reliability and performance [11] [9] [25] [2]. System reliability is improved by allowing access to
the data even when some of the replicas are unavailable. Performance improvements concern reduced
latency, by letting users access nearby replicas, and increased throughput, by letting multiple
entities serve the data.

For a data replication system to work in practice, it should have three desirable features,
namely consistency (C), availability (A), and partition tolerance (P) [6]. Specifically, consistency
is equivalent to having a single (emulated) up-to-date copy of the data. Availability informally
refers to the property that each request eventually receives a response. Meanwhile, a system can
be unreliable in many ways, e.g., crash failures, message loss, broken communication, and malicious
attacks. Data replicas can then be partitioned into several isolated groups. It is desirable for the
system to tolerate such partitions [6].

Unfortunately, a data replication system cannot have all three desirable features simultaneously
[6] [13]. This impossibility result, known as the CAP theorem, leads to multiple balance options
among consistency, availability, and partition tolerance. For modern data replication systems,
commonly distributed over a wide area network, partition tolerance is a must [3] [11]. Moreover,
both the data storage service providers and their customers prefer high data availability, for
commercial reasons and an "always-on" user experience [11] [9] [25]. Therefore, reality forces us
to trade consistency for availability and partition tolerance in many data replication systems.

Researchers have developed many levels of consistency conditions to meet various requirements
of data replication systems. Strong consistency, such as linearizability [18] or sequential consistency
[21], imposes very strict requirements on Read/Write operations. Data replication systems
with strong consistency may require a high-cost implementation and result in unacceptable access
latency. Weak consistency conditions [22] [2] [25] provide alternatives that relax the strict
requirements and allow more efficient implementations. For example, Yahoo!'s PNUTS [9] chooses
to provide per-record timeline consistency. Amazon's Dynamo only promises eventual consistency.
Nowadays, weak consistency is playing a more and more important role, with the
prevalence of cloud data storage services, mobile devices, and wireless communications.

In this work, we focus on PRAM consistency [22], one of the well-known weak consistency
conditions. Informally, a Read/Write trace conforms to PRAM consistency if and only if
Write operations performed by a single process are observed by all the other processes in program
order. PRAM consistency is weak in the sense that it does not require all the processes to agree
on the same view of the order in which operations occur [3]. To illustrate its practical usefulness,
let us consider the photo sharing application described in [9]. In this application, users are allowed
to post photos with their own preferences for photo access control. Now imagine that Alice wishes
to share some photos with her close friends but not with her mother. She performs a sequence of two
updates to her record: she adds her friends to and removes her mother from the photo access list, and
then posts the photos. Under the PRAM consistency condition, the two updates are guaranteed to be
performed at all other processes in the order in which they are submitted.

Different protocols can be designed to guarantee PRAM consistency. However, it is notoriously
difficult to prove the correctness of such protocols. One important complementary approach
is to verify whether the Read/Write traces of data replicas satisfy PRAM consistency. Though
PRAM consistency is regarded as useful, to the best of our knowledge, its verification problem has
not been addressed yet. In this work, we address the problem of verifying PRAM consistency over
Read/Write traces. Specifically,

• We first propose the RW-Closure algorithm, which models replica Read/Write operations as
graph nodes, and ordering relations between operations as directed edges. PRAM consistency
can then be captured by three rules of iteratively adding edges to the transitive closure of
the graph. Its worst-case complexity is $O(n^5)$, where $n$ is the number of operations in the
trace.

• We then propose the Read-Centric algorithm with time complexity $O(n^4)$. It restricts the
applications of the three rules to a Read-induced subgraph and manages them in topological
order, utilizing the fact that all Reads are on one process, as required in PRAM consistency.

• Experiments are conducted to show the efficiency and scalability of the two proposed
algorithms.

The rest of the paper is organized as follows. Section 2 discusses the related work. Section 3
defines the problem of verifying PRAM consistency over Read/Write traces of data replicas. Section
4 and Section 5 propose the RW-Closure algorithm and the Read-Centric algorithm respectively.
Section 6 presents the performance evaluation. Section 7 concludes the paper with a brief summary
and future work.

2 Related Work 

The data replication technique poses a great challenge for maintaining consistency among different
replicas. Many different levels of consistency conditions have been proposed, ranging from strong
consistency such as linearizability [18] and sequential consistency [21] to weaker ones such
as PRAM consistency [22], causal consistency [2], and eventual consistency [25]. Much effort has
been made to discuss the relationship among different consistency conditions [24] [16]. Specifically,
Steinke and Nutt [24] first identify a set of four consistency properties, and then reinterpret the
existing consistency conditions as combinations of subsets of them. In this way, they establish
a lattice structure of consistency conditions. Haldar and Vidyasankar [16] identify five different
types of "recentness" of the value a Read could return and use them to define five Read illegalities.
The consistency conditions are then reinterpreted in terms of the presence or absence of these Read
illegalities.

Another line of related work is the verification problem with respect to other consistency conditions.
Given a trace of data replicas (or, in other words, an execution of a shared memory system) and
a specific consistency condition, the verification problem is to check whether the trace (or
execution) satisfies the consistency condition. In their seminal work, Gibbons and Korach [12] define
the verifying sequential consistency (VSC) and the verifying linearizability (VL) problems. Both
problems are proved to be NP-Complete in general. They also systematically study many variants
of the two problems and provide a collection of complexity results. For example, the VSC
problem is NP-Complete even when the number of shared variables is bounded. Cantin et al. [8]
focus on the verifying memory coherence (VMC) problem. Similarly, the problem is proved to be
NP-Complete and is also analyzed under some important restrictions, for example, on the number
of operations per process or on the number of Writes of each value. Golab et al. [15] address the
verification problems with respect to safety, atomicity, regularity, and sequential consistency. More
importantly, they explore other aspects of such verification problems. First, they consider how to
detect a consistency violation as soon as one happens by means of an online algorithm. Second,
they further consider how to quantify the severity of the violations by means of the staleness of the
Reads and the commonality of violations. Our work in this paper investigates the verification
problem with respect to PRAM consistency (VPC), one of the weak consistency conditions. Though
PRAM consistency is regarded as useful, to the best of our knowledge, its verification problem has
not been addressed yet.

In the context of shared memory multiprocessors, some relaxed memory consistency conditions
have been studied in [17]. Specifically, Hangal et al. [17] develop TSOtool to analyze the
traces of programs against the TSO (Total Store Order) model (VTSO). The time complexity of their
algorithm is $O(n^5)$, where $n$ is the number of instructions in the program. Roy et al. [23] deal
with the VTSO problem and present a fully parallelized algorithm with $O(n^4)$ time complexity.
Baswana et al. [5] identify a graph problem called implied-set-closure (ISC) as the abstraction
of the bottleneck of the VTSO problem, and reduce the time complexity to $O(n^3)$. However, all
the above algorithms only do approximate checking because the problem itself is NP-Complete.
In contrast, our work in this paper focuses on PRAM consistency. PRAM consistency is one of
the well-known weak consistency conditions, in the sense that it does not require all the processes
(processors) to agree on the same view of the order in which operations (instructions) occur.
Though inspired by [23], we prove that the VPC problem with multiple shared variables and unique
Write values can be solved in polynomial time by proposing two polynomial algorithms.

3 Problem Definition 

In this section, we first define the Read/Write trace of data replicas and PRAM consistency, and
then define the problem of verifying PRAM consistency over the Read/Write trace.

3.1 Read/Write Trace

We model the data replication system as a collection of processes, and the data replicas as a
collection of Read/Write shared variables.

Definition 3.1 (Operation ($o$))

An operation $o$ is a tuple $(t, p, v, d)$ where

• $t \in \{R, W\}$ is the type of operation ($R$ for Read and $W$ for Write). Each operation is
completed in the sense that a Read has returned its value or a Write has been applied and
acknowledged;

• $p \in P$ is the id of the process submitting the operation;

• $v \in V$ is the variable to which the operation is applied; and

• $d$ is the value involved in the operation on variable $v$.

We assume that each process has at most one operation pending at a time. Furthermore, each
operation completes instantaneously.

We adopt the following notational conventions for an operation $o = (t, p, v, d)$. The process is
denoted by $p(o)$. The variable and the value involved are denoted by $var(o)$ and $val(o)$ respectively.
Unless stated otherwise, we use $o$ for any operation, $r$ for any Read, $w$ for any Write, $O$ for the set
of all operations, $R$ for the set of all Reads, and $W$ for the set of all Writes.

There are two basic partial orders between these operations. Program order, $\prec_{PO}$, is the order
in which operations are conducted by each process.

Definition 3.2 (Program Order ($\prec_{PO}$))

$(o_1, o_2) \in \prec_{PO}$ if and only if $p(o_1) = p(o_2)$ and $o_1$ is completed before $o_2$. We employ
$\preceq_{PO}$ to denote the reflexive counterpart of $\prec_{PO}$.

Write-to order, $\prec_{WR}$, defines which Write is read by each Read.

Definition 3.3 (Write-to Order ($\prec_{WR}$))

$(o_1, o_2) \in \prec_{WR}$ if and only if $o_1 \in W \land o_2 \in R$, and $var(o_1) = var(o_2) \land val(o_1) = val(o_2)$.
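
As an illustration, both relations can be computed directly from a trace. The following is a minimal Python sketch rather than part of the formalization: the `Op` record, its `idx` field (position in the process history), the dictionary-based trace layout, and the function names are all assumptions made for this example. It also assumes that Write values are unique per variable, so that each Read matches at most one Write.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Op:
    kind: str   # "R" for Read, "W" for Write
    proc: int   # id of the submitting process
    var: str    # shared variable the operation applies to
    val: int    # value read or written
    idx: int    # position in the process history (hypothetical helper field)


def program_order(trace):
    """Pairs (o1, o2) such that o1 precedes o2 on the same process."""
    pairs = set()
    for history in trace.values():
        # combinations() preserves list order, so o1 always comes first.
        pairs.update(combinations(history, 2))
    return pairs


def write_to_order(trace):
    """Pairs (w, r) such that Read r returns the value written by Write w."""
    ops = [o for history in trace.values() for o in history]
    # Assumption: Write values are unique per variable.
    writes = {(o.var, o.val): o for o in ops if o.kind == "W"}
    return {(writes[(o.var, o.val)], o)
            for o in ops if o.kind == "R" and (o.var, o.val) in writes}


# Example trace: process 0 writes x twice, process 1 reads both values.
trace = {
    0: [Op("W", 0, "x", 1, 0), Op("W", 0, "x", 2, 1)],
    1: [Op("R", 1, "x", 1, 0), Op("R", 1, "x", 2, 1)],
}
po = program_order(trace)
wr = write_to_order(trace)
```

Here `po` contains one pair per process and `wr` links each Read to the Write whose value it returns.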
We now define the Read/Write traces of data replicas as follows.

Definition 3.4 (Read/Write Trace ($t$))

A Read/Write trace $t$ of data replicas comprises multiple process histories, each of which
consists of a finite sequence of Read and Write operations in program order.

In this work, we are concerned with a schedule (denoted $\pi$) of a collection of operations $O$,
which is a linear extension (denoted $\pi_O$) of all the operations with some predefined partial orders.
Given a schedule $\pi$ of operations, the preceding relation between any two operations is denoted by
$\prec_{\pi}$.
3.2 PRAM Consistency Condition 

The PRAM consistency condition is one of the well-known weak consistency conditions [22] [23]. It takes
into account both program order and write-to order. Informally, a Read/Write trace conforms
to PRAM consistency if and only if Write operations performed by a single process are observed
by all other processes in program order, whereas Write operations from different processes may be
observed in different orders by different processes [24]. There are two key points to explain. On the
one hand, PRAM consistency is weak in that it does not require all the processes to agree on the
same view of the order in which operations occur [3]. This implies that each process can be checked
against PRAM consistency separately. On the other hand, the operations visible to each process
include only its own Reads and all Write operations, while Read operations from other
processes are ignored. This establishes an important fact: for process $p$, all the visible Reads are on
the same process (i.e., $p$ itself). This fact is crucial for the design of the verification
algorithms in Section 4 and Section 5. In the following, the set of visible operations for process $p$
is denoted by $O_p = \{o \mid (p(o) = p \land o \in R) \lor (o \in W)\}$.

With the explanations above, we can now rephrase the PRAM consistency condition as follows: if,
for every process $p$ and its visible operations $O_p$, there exists a legal schedule of $O_p$ respecting both
program order and write-to order, then the trace is regarded as PRAM consistent.

Here, the legal schedule requires that each Read reads the value from the latest preceding
Write in the schedule. This is actually the fundamental correctness requirement for all consistency
conditions [24] [3] and deserves a formal definition.

Definition 3.5 (Legal Schedule ($LS$))

A schedule of operations is legal if and only if each Read reads the value from the latest preceding
Write in the schedule.

The predicate $LS(\pi)$ evaluates to true if and only if the schedule $\pi$ is legal.
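
The visible set $O_p$ and the predicate $LS(\pi)$ translate almost literally into code. The sketch below is illustrative only: the `Op` tuple and the function names are assumptions, "latest preceding Write" is read as the latest preceding Write on the Read's own variable, and a Read with no preceding Write on its variable is treated as illegal.

```python
from collections import namedtuple

# Hypothetical operation record: kind is "R" or "W".
Op = namedtuple("Op", "kind proc var val")


def visible_ops(ops, p):
    """O_p: the Reads of process p together with the Writes of all processes."""
    return [o for o in ops if o.kind == "W" or o.proc == p]


def is_legal(schedule):
    """LS(pi): every Read returns the value of the latest preceding Write
    on the same variable."""
    last_written = {}
    for o in schedule:
        if o.kind == "W":
            last_written[o.var] = o.val
        elif last_written.get(o.var) != o.val:
            # Assumption: a Read with no preceding Write is not legal.
            return False
    return True


w1, w2 = Op("W", 0, "x", 1), Op("W", 0, "x", 2)
r1, r2 = Op("R", 1, "x", 1), Op("R", 2, "x", 2)
```

For instance, `[w1, r1, w2]` is legal, whereas `[w1, w2, r1]` is not, because `w2` overwrites the value that `r1` returns.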

On the other hand, the notion of respect is formalized as follows.

Definition 3.6 (Respect Property)

A schedule $\pi$ of a set of operations $O$ is said to respect some partial order $R$ (denoted $\pi^{R}$) if and
only if the schedule is a linearization of the partial order. Formally,

$$\pi^{R} \iff \forall o_1, o_2 \in O.\ \bigl( (o_1, o_2) \in R \Rightarrow o_1 \prec_{\pi} o_2 \bigr),$$

where $o_1 \prec_{\pi} o_2$ means that $o_1$ precedes $o_2$ in $\pi$.

To sum up, the formal notion of the PRAM consistency condition is as follows.

Definition 3.7 (PRAM Consistency Condition)

A Read/Write trace $t$ satisfies the PRAM consistency condition if and only if for every process $p$ in the
trace and its visible operations $O_p$, there exists a legal schedule of $O_p$ respecting both program
order ($\prec_{PO}$) and write-to order ($\prec_{WR}$). Formally,

$$\forall p \in P.\ \exists \pi_{O_p}.\ LS(\pi_{O_p}) \land \pi_{O_p}^{\prec_{PO} \cup \prec_{WR}}.$$
                          Multiple variables    Single variable
Write duplicate values    VPC-MD (?)            VPC-SD (?)
Write unique values       VPC-MU (P)            VPC-SU (P) [15]

Table 1: The variants and complexity issues of the VPC problem.


According to Definition 3.7, we can check each process against PRAM consistency separately.
In the remainder of this paper, we thus focus on the verification problem with respect to one
particular process, which we denote by $p_0$.
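
For very small traces, the per-process check can be carried out by brute force, simply enumerating all schedules of the visible operations. The sketch below is only meant to make the definition concrete; it is exponential and is not one of the algorithms proposed in this paper. The `Op` tuple, its `idx` field (position in the process history), the helper `mk`, and the function names are assumptions. With unique Write values, a legal schedule automatically respects write-to order, so only program order is checked explicitly.

```python
from collections import namedtuple
from itertools import permutations

# Hypothetical record: kind is "R" or "W"; idx is the position in the history.
Op = namedtuple("Op", "kind proc var val idx")


def is_legal(schedule):
    """Every Read returns the latest preceding Write on its variable."""
    last = {}
    for o in schedule:
        if o.kind == "W":
            last[o.var] = o.val
        elif last.get(o.var) != o.val:
            return False
    return True


def respects_program_order(schedule):
    """Operations of one process must appear in increasing idx order."""
    seen = {}
    for o in schedule:
        if seen.get(o.proc, -1) > o.idx:
            return False
        seen[o.proc] = o.idx
    return True


def pram_consistent_bruteforce(trace):
    """Check every process separately against its own visible operations."""
    writes = [o for h in trace.values() for o in h if o.kind == "W"]
    for history in trace.values():
        visible = writes + [o for o in history if o.kind == "R"]
        if not any(respects_program_order(s) and is_legal(s)
                   for s in permutations(visible)):
            return False
    return True


def mk(proc, *ops):
    """Build one process history from (kind, var, val) triples."""
    return [Op(k, proc, v, d, i) for i, (k, v, d) in enumerate(ops)]


# Two readers observe Writes from different processes in different orders.
good = {0: mk(0, ("W", "x", 1)), 1: mk(1, ("W", "x", 2)),
        2: mk(2, ("R", "x", 1), ("R", "x", 2)),
        3: mk(3, ("R", "x", 2), ("R", "x", 1))}
# A reader observes two Writes of the same process against program order.
bad = {0: mk(0, ("W", "x", 1), ("W", "x", 2)),
       1: mk(1, ("R", "x", 2), ("R", "x", 1))}
```

The trace `good` is PRAM consistent although its two readers disagree on the order of the Writes, which is exactly the freedom that checking each process separately allows; the trace `bad` is rejected.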

3.3 The Problem of Verifying PRAM Consistency 

In this section, we define the problem of verifying PRAM consistency over Read/Write traces as a 
decision problem. 

Definition 3.8 (Verifying PRAM Consistency (VPC)) 

• Instance: A Read/Write trace t of data replicas. The size of t is defined as the total number 
of the operations in it and denoted by n. 

• Question: Does t satisfy PRAM consistency? 



Following the terminology in [12], we identify four variants of the general VPC problem along
two orthogonal dimensions: a) whether there are multiple shared variables, and b) whether the
Write operations are uniquely valued for each shared variable.

As stated in Table 1, the VPC-SU problem can be solved in polynomial time, trivially
following from [15]. In this paper, we focus on the VPC-MU problem: verifying PRAM consistency
over Read/Write traces under the assumption that there are multiple shared variables and the Write
operations are uniquely valued for each shared variable. In practice, each Write can be tagged with a
globally unique identifier, e.g., by combining its process id and its local index on that process. Therefore,
the latter assumption does not incur any loss of generality. Note that in the VPC-MU problem, for
each Read operation $r$, there is at most one Write (denoted $D(r)$, for dictating Write) from which
$r$ reads the value.

To the best of our knowledge, the VPC-MU problem has not been addressed yet. In this paper,
we demonstrate that it can be solved in polynomial time by proposing two verification
algorithms: the RW-Closure algorithm with $O(n^5)$ time complexity and the Read-Centric algorithm
with $O(n^4)$ time complexity.

4 The RW-Closure Algorithm 

In this section, we present the basic idea of the RW-Closure algorithm, its detailed design, an 
illustrating example, and more importantly its correctness proof. 

4.1 Overview 

The RW-Closure algorithm models the Read/Write trace as a directed graph (denoted $\mathcal{G}$) with operations as
nodes and ordering relations between operations as edges. PRAM consistency is captured by three
kinds of edges. The RW-Closure algorithm keeps adding such edges until no more edges can be
added. The trace $t$ is regarded as PRAM consistent if and only if the resulting graph is acyclic
(i.e., a DAG).

Specifically, at least two kinds of edges are necessary to meet PRAM consistency: edges for
program order and edges for write-to order. The third kind of edges can be derived from the notion of
legal schedule in Definition 3.5 [24] [23]. In a legal schedule, between each Read operation $r$ on
the variable $v$ and its dictating Write $w = D(r)$, there cannot be any other Write (denoted $w'$) on
the same variable $v$. This observation results in two cases: 1) if $w' \prec r$ in advance, we have $w' \prec w$;
and 2) if $w \prec w'$, we have $r \prec w'$.

Figure 1: Illustration of the RW-Closure algorithm.

To sum up, we get the following four rules for adding edges in the graph $\mathcal{G}$.

1. (Rule (a): program order) For any pair of operations $o_1$ and $o_2$, if $o_1 \prec_{PO} o_2$, then add
an edge from $o_1$ to $o_2$.

2. (Rule (b): write-to order) For any pair of operations $w$ and $r$, if $w \prec_{WR} r$, then add an
edge from $w$ to $r$.

3. (Rule (c): $w'wr$ order) For any triple of operations $w$, $r$, and $w'$, if $(w \prec_{WR} r) \land (var(w') =
var(w)) \land (w' \prec r)$, then add an edge from $w'$ to $w$, leading to $w' \prec_{W'W} w \prec_{WR} r$. We
denote the ordering relation between such $w'$ and $w$ by $\prec_{W'W}$.

4. (Rule (d): $wrw'$ order) For any triple of operations $w$, $r$, and $w'$, if $(w \prec_{WR} r) \land (var(w') =
var(w)) \land (w \prec w')$, then add an edge from $r$ to $w'$, leading to $w \prec_{WR} r \prec_{RW'} w'$. We
denote the ordering relation between such $r$ and $w'$ by $\prec_{RW'}$.

In the following sections, we will show that the first three rules are sufficient for the VPC-MU
problem.

4.2 Detailed Design 

As shown in Algorithm 1, program order and write-to order together lay a foundation for further
edges to be added (Lines 1-6). To apply Rule (c), the algorithm first needs to identify the triples
that match it. To this end, the algorithm checks each Read ($r$) and its unique dictating Write
($w$) to find all potential $w'$ such that there is a path from $w'$ to $r$ (i.e., $w' \prec r$) (Lines 9-14).
The reachability relation between $w'$ and $r$ can be computed in advance by transitive closure
algorithms (Line 8) based on the $n \times n$ Boolean operation matrix (opMatrix). If any edges are
added by Rule (c), new triples matching Rule (c) can emerge due to the updated reachability
relation. Therefore, the algorithm keeps applying Rule (c) and computing the transitive closure
until no more edges are added (Lines 15-17). Finally, the trace satisfies PRAM
consistency if and only if the resulting graph $\mathcal{G}$ is acyclic (Lines 19-20).
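
The steps described above can be sketched in a few lines of Python. This is an illustrative reading of the RW-Closure design for one process $p_0$, not the authors' implementation: the `Op` tuple (with `idx` as the position in the process history), the trace layout, and the function names are assumptions, and the plain Boolean matrix `m` plays the role of opMatrix. Only adjacent program-order pairs are inserted, since the transitive closure supplies the remaining ones.

```python
from collections import namedtuple

# Hypothetical record: kind is "R" or "W"; idx is the position in the history.
Op = namedtuple("Op", "kind proc var val idx")


def transitive_closure(m):
    """Warshall-style closure of a Boolean adjacency matrix, in place."""
    n = len(m)
    for k in range(n):
        for i in range(n):
            if m[i][k]:
                for j in range(n):
                    if m[k][j]:
                        m[i][j] = True


def rw_closure(trace, p0):
    """Return True iff process p0 passes the closure-based check."""
    # Visible operations of p0: all Writes plus p0's own Reads.
    ops = [o for h in trace.values() for o in h
           if o.kind == "W" or o.proc == p0]
    pos = {o: i for i, o in enumerate(ops)}
    n = len(ops)
    m = [[False] * n for _ in range(n)]

    # Step 1, Rule (a): program order between consecutive visible operations.
    for history in trace.values():
        visible = [o for o in history if o in pos]
        for a, b in zip(visible, visible[1:]):
            m[pos[a]][pos[b]] = True

    # Step 2, Rule (b): write-to order (unique Write values assumed).
    writes = {(o.var, o.val): o for o in ops if o.kind == "W"}
    dictating = {}
    for r in ops:
        if r.kind == "R":
            w = writes.get((r.var, r.val))
            if w is None:
                return False            # a Read without dictating Write
            dictating[pos[r]] = pos[w]
            m[pos[w]][pos[r]] = True

    # Steps 3 and 4: alternate closure and Rule (c) until a fixed point.
    changed = True
    while changed:
        transitive_closure(m)
        changed = False
        for r, w in dictating.items():
            for w2 in range(n):
                o = ops[w2]
                if (o.kind == "W" and m[w2][r] and o.var == ops[w].var
                        and w2 != w and not m[w2][w]):
                    m[w2][w] = True     # edge w' -> w
                    changed = True

    # Step 5: the graph is a DAG iff no node reaches itself.
    return not any(m[i][i] for i in range(n))


good = {0: [Op("W", 0, "x", 1, 0), Op("W", 0, "x", 2, 1)],
        1: [Op("R", 1, "x", 1, 0), Op("R", 1, "x", 2, 1)]}
bad = {0: [Op("W", 0, "x", 1, 0), Op("W", 0, "x", 2, 1)],
       1: [Op("R", 1, "x", 2, 0), Op("R", 1, "x", 1, 1)]}
```

A whole trace would then be checked with `all(rw_closure(trace, p) for p in trace)`. In the trace `bad`, Rule (c) adds an edge from the Write of value 2 to the Write of value 1, which closes a cycle with the program-order edge between them.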

Algorithm 1: The RW-Closure algorithm.

Data: A Read/Write trace $t$.
Result: True, if $t$ satisfies PRAM consistency; False, otherwise.

 1  Step 1:
 2      Apply Rule (a) to add "program order" edges;
 3  Step 2:
 4      Apply Rule (b) to add "write-to order" edges;
 5      if some r has no dictating Write D(r) then
 6          return false;
 7  Step 3:
 8      Compute the transitive closure of $\mathcal{G}$;
 9      Apply Rule (c) to add "w'wr order" edges:
10      foreach Read operation r do
11          w ← D(r);
12          foreach w' s.t. opMatrix[w'][r] = 1 do
13              if var(w') = var(w) ∧ w' ≠ w ∧ opMatrix[w'][w] = 0 then
14                  opMatrix[w'][w] ← 1;
15  Step 4:
16      if any edges are added by Rule (c) then
17          Goto Step 3;
18  Step 5:
19      if $\mathcal{G}$ is a DAG then return true;
20      else return false;

4.3 An Illustrating Example

Figure 1 shows a running example for the RW-Closure algorithm. The edges for program order
and write-to order are denoted by solid lines. The edges added by Rule (c) are denoted by dashed
lines, with labels indicating the order in which they are added. For example, after the application of Rule
(c) to Wy2, Wy1, and Ry1 (with label 4), the ensuing transitive closure computation identifies a path
from Wf2 to Rf1 (via the edges with label 3 and label 4), which in turn leads to another application of
Rule (c) to Wf2, Wf1, and Rf1 (with label 5).

We can figure out a legal schedule of all the operations as a witness to PRAM consistency
(Equation 4.1). The specific schedule will be justified by the correctness proof of Theorem 4.1
in Section 4.4. Note that the Reads are stressed in bold.

$$
\begin{aligned}
& Wf2,\ Wf1,\ Wz2,\ Wz1,\ Wy2,\ Wy1,\ \mathbf{Rf1}; \\
& Wx5,\ Wx3,\ Wx2,\ Wc1,\ \mathbf{Rc1};\ \mathbf{Rz1};\ \mathbf{Ry1}; \\
& Wa1,\ \mathbf{Ra1};\ Wb1,\ \mathbf{Rb1};\ \mathbf{Rx2}.
\end{aligned}
\tag{4.1}
$$

4.4 Correctness Proof 

If the resulting graph $\mathcal{G}$ from Algorithm 1 is a DAG, we expect to construct some legal schedule as
a witness to PRAM consistency. To this end, we perform a specific topological sorting on $\mathcal{G}$ and
prove that the resulting schedule (denoted $\pi_{\mathcal{G}}$) is legal. The topological sorting is conducted in line
with the Reads on $p_0$ and is based on the following two definitions.

Definition 4.1 (r-downset ($r_{\Downarrow}$))

The r-downset of a Read operation $r$ is the smallest set $r_{\Downarrow}$ of operations that contains $r$ and has
the property that

• $o \in r_{\Downarrow} \land o' \prec o \Rightarrow o' \in r_{\Downarrow}$.

Intuitively, the r-downset consists of all the operations which must be scheduled before $r$, including
$r$ itself.

In the following, we assume that $r'$ and $r$ are two consecutive Read operations on $p_0$ with
$r' \prec_{PO} r$. The r-delta captures the operations that are new in the r-downset relative to the $r'$-downset.
Definition 4.2 (r-delta ($r_{\delta}$))

The r-delta of a Read operation $r$ is the set $r_{\delta}$ of operations which equals the relative complement of
$r'_{\Downarrow}$ with respect to $r_{\Downarrow}$ (i.e., $r_{\Downarrow} \setminus r'_{\Downarrow}$). For the first Read operation $r$, we have $r_{\delta} = r_{\Downarrow}$.

In terms of $r_{\Downarrow}$ and $r_{\delta}$, the following definition describes the construction of $\pi_{\mathcal{G}}$.

Definition 4.3 (DAG-schedule ($\pi_{\mathcal{G}}$))

Given the DAG $\mathcal{G}$, the schedule $\pi_{\mathcal{G}}$ is constructed as follows:

• initialize $\pi_{\mathcal{G}}$ to be the empty sequence;

• for each Read operation $r$ of process $p_0$ in program order, perform any topological sorting on
$r_{\delta}$ and append the resulting linear ordering of the operations in $r_{\delta}$ to $\pi_{\mathcal{G}}$.

The example in Section 4.3 (Equation 4.1) illustrates the DAG-schedule.
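
The construction of the DAG-schedule can be sketched as follows. The sketch assumes that the acyclic graph is given as a set of edges over hashable operation labels and that the Reads of $p_0$ are supplied in program order; the function names and the string labels are illustrative only. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) for the per-delta topological sort, and, like Definition 4.3, it schedules only operations that lie in the downset of some Read.

```python
from graphlib import TopologicalSorter


def downset(r, preds):
    """r-downset: r together with every operation that has a path to r."""
    seen, stack = {r}, [r]
    while stack:
        for u in preds.get(stack.pop(), ()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def dag_schedule(edges, reads_in_program_order):
    """Append a topological order of each r-delta, Read by Read."""
    preds = {}
    for u, v in edges:
        preds.setdefault(v, set()).add(u)
    schedule, done = [], set()
    for r in reads_in_program_order:
        # r-delta: the part of the downset not scheduled by earlier Reads.
        delta = downset(r, preds) - done
        # Keep only the edges inside the delta for the local sort.
        sub = {o: preds.get(o, set()) & delta for o in delta}
        schedule.extend(TopologicalSorter(sub).static_order())
        done |= delta
    return schedule


# Edges of a small acyclic graph: program order, write-to order, and
# one Rule (c) style edge between the two Writes.
edges = {("Wx1", "Wx2"), ("Wx1", "Rx1"), ("Wx2", "Rx2"), ("Rx1", "Rx2")}
schedule = dag_schedule(edges, ["Rx1", "Rx2"])
```

In this example the delta of the first Read is {Wx1, Rx1} and the delta of the second Read is {Wx2, Rx2}, so the resulting schedule is Wx1, Rx1, Wx2, Rx2.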



Theorem 4.1 (Correctness of the RW-Closure Algorithm)

An instance of the VPC-MU problem satisfies PRAM consistency if and only if the RW-Closure
algorithm terminates with a DAG $\mathcal{G}$. □

Proof There are two directions to prove:

1. ("$\Rightarrow$") We prove this direction by contradiction. If the resulting graph $\mathcal{G}$ is not a DAG, it contains
a cycle, so some operation would have to be committed before itself. Contradiction; and

2. ("$\Leftarrow$") If the resulting graph $\mathcal{G}$ is acyclic, we prove that the schedule $\pi_{\mathcal{G}}$ constructed in
Definition 4.3 is legal. This is shown as Lemma 4.1.

Lemma 4.1 (DAG-schedule is Legal)

If the resulting graph $\mathcal{G}$ of Algorithm 1 is acyclic, then the schedule $\pi_{\mathcal{G}}$ constructed in Definition
4.3 is legal.

Proof We prove this lemma by induction on the Reads of process $p_0$ in program order.

(I. Base Case) For the first Read operation $r = Rxd$, it is easy to check that (Figure 2 (1)):

• It is not the case that $Rxd \prec_{PO} Wxd$; and

• $w = Wxd$ is not overwritten by any other Write operation. Formally,

$$\forall w' \, \bigl( var(w') = x \land w' \neq w \Rightarrow w' \prec w \prec_{WR} r \bigr).$$

Thus, we can perform any topological sorting conforming to the DAG to get a legal schedule for
the base case, in the sense that $Rxd$ reads the value from $Wxd$.

(II. Induction Hypothesis) Assuming that all of the first $(n-1)$ Read operations have been scheduled
successfully, it remains to prove that the $n$-th Read operation (denoted $r = Rxd$) will also be
scheduled successfully according to the DAG-schedule in Definition 4.3.

(III. Induction Step) We distinguish between two cases according to whether the dictating Write $D(r)$
has been scheduled before. Note that we mark the previous Read operation as $r'$.

1) ($D(r) \notin r'_{\Downarrow}$) Note that there is only one Read operation (i.e., $Rxd$) in $r_{\delta}$. On the other
hand, the Write operations in $r_{\delta}$ on variables other than $x$ have no effect on the topological sorting
which has been established up to $r'$. The argument is therefore reduced to the base case and we
can directly perform any topological sorting on $r_{\delta}$ and append it to the existing schedule.

2) ($D(r) \in r'_{\Downarrow}$) There are still two subcases to consider. In either case, it remains to show that
$r$ can be scheduled legally (Figure 2 (2a) and (2b)).

2.1) (There exists another $Rxd$ preceding $r$ on process $p_0$) Take the earliest $Rxd$ and denote
it by $\tilde{r}$ (Figure 2 (2a)). We demonstrate that $r$ (the $n$-th Read) can be scheduled legally. In the
following, $d$, $d'$, and $d''$ are three different values.

• If $Wxd' \in \tilde{r}_{\Downarrow}$, then we have $Wxd' \prec_{W'W} Wxd \prec_{WR} \tilde{r} \prec_{PO} r$ by the induction hypothesis ($Wxd'$
with label 1).

• If $Wxd' \notin \tilde{r}_{\Downarrow}$, then we have $Wxd' \notin r_{\Downarrow}$. Otherwise, $Wxd' \prec r \Rightarrow Wxd' \prec_{W'W} Wxd$, and since
$Wxd \prec_{WR} \tilde{r}$ this places $Wxd'$ in $\tilde{r}_{\Downarrow}$. Contradiction ($Wxd'$ with label 2 and label 3).

• If $Wxd' \notin r_{\Downarrow}$, then $Wxd'$ has no effect on the schedule of $Rxd$ up to now.

In conclusion, we can perform any topological sorting on $r_{\delta}$ (actually, in this case, there is
only one topological sorting on $r_{\delta}$ because all operations are on $p_0$) and append it to the existing
schedule.

2.2) (There is no other $Rxd$ preceding the $n$-th one on process $p_0$) In this case, we can conclude
that there is no $Rxd'$ such that $Rxd' \prec_{PO} Rxd \land Wxd \in Rxd'_{\Downarrow}$ either (Figure 2 (2b)). Otherwise,
there exists a cycle involving $Wxd$ and $Wxd'$ ($Wxd$ with label 1) following from: a) $Wxd' \prec_{WR}
Rxd' \prec_{PO} Rxd \Rightarrow Wxd' \prec_{W'W} Wxd$; and b) $Wxd \prec Rxd' \Rightarrow Wxd \prec_{W'W} Wxd'$. Therefore, $Wxd$
will not be overwritten, by an argument similar to that in Case 2.1) ($Wxd$ with label 2 and $Wxd''$ with
label 3).

Therefore, we can perform any topological sorting on $r_{\delta}$ and append it to the existing schedule. ■



As mentioned in Section 4.1, Rule (d) (i.e., the $wrw'$ order) is not necessary for the VPC-MU
problem. We now explain this point more directly. In Figure 2 (3), $R$, $W$, and $W'$ are all on the same
variable. $R$ reads the value from $W$ and $W$ precedes $W'$. We show that the application
of Rule (d) to $W$, $R$, and $W'$ is dispensable for cycle detection. First, it is necessary
for $W' \prec R$ (edge with label 2) to close the cycle involving $R \to W'$. Consequently, $W' \prec_{W'W} W$
(edge with label 3), due to Rule (c), closes another cycle together with $W \prec W'$ (edge with label
1).

4.5 Time and Space Complexity 

The worst-case time complexity of the RW-Closure algorithm is dominated by the cost of Step 3
(Lines 7-14 in Algorithm 1). The transitive closure of $\mathcal{G}$ (Line 8) can be computed in $O(n^3)$ time
by the Floyd-Warshall algorithm [10]. Rule (c) calls for the exploration of relations like $w' \prec r$
(Line 12). It costs $O(n^2)$ to examine all the potential pairs of nodes. The iteration over Step 3 and
Step 4 may loop at most $O(n^2)$ times, adding one edge by Rule (c) in each iteration. In total, the
worst-case time complexity of the RW-Closure algorithm is $O(n^5)$.

Clearly, the space complexity of the RW-Closure algorithm is $O(n^2)$, resulting from the
Boolean operation matrix (opMatrix).
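
The time bound can be read off as the number of iterations multiplied by the cost of one iteration. The following derivation only restates the argument above, under the assumption that a graph on $n$ nodes admits at most $n^2$ edges, so that Rule (c) can add an edge at most $O(n^2)$ times:

```latex
\[
\underbrace{O(n^2)}_{\text{iterations of Steps 3--4}}
\cdot
\Bigl(
  \underbrace{O(n^3)}_{\text{transitive closure}}
  +
  \underbrace{O(n^2)}_{\text{scan for Rule (c)}}
\Bigr)
= O(n^5).
\]
```

The closure computation dominates each iteration, so the quadratic scan for Rule (c) does not affect the overall bound.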

5 The Read-Centric Algorithm

In this section, we propose the Read-Centric algorithm, including an overview, its detailed design,
correctness proof, and complexity analysis.

5.1 Overview

Basically, for any consistency condition, a violation cannot be revealed without Read operations.
For the problem of verifying PRAM consistency, all the Read operations under consideration are on
the same process (i.e., $p_0$). The Read-Centric algorithm examines the Read operations one by one
in program order and tries to construct a legal schedule for each one. If the algorithm gets through
all the Read operations, the trace satisfies PRAM consistency. Otherwise, the
algorithm aborts with cycle detection, indicating a violation of PRAM consistency. The basic idea
is sketched in Algorithm 2 (Lines 6-11). Note that we have ruled out the trivial cases of violations
of PRAM consistency (Lines 4-5): a) some Read operation $r$ has no dictating Write; or b) $r$
comes before $D(r)$ in program order.

Algorithm 2: The Read-Centric algorithm (basic idea).
Data: A trace t.
Result: True, if t satisfies PRAM consistency; False, otherwise.

 1 Preprocessing:
 2   apply Rule (a) to add "program order" edges;
 3   apply Rule (b) to add "write-to order" edges;
 4   if $\exists r$. ($r$ having no $D(r)$ or $r \prec_{po} D(r)$) then
 5     return false;
 6 foreach Read operation $r$ in program order do
 7   mark its previous Read operation $r'$;
 8   Call compute_activeWrites($r$, $r'$) in Alg. 3;
 9   cycle $\leftarrow$ Call schedule($r_{\Downarrow}$) in Alg. 4;
10   if cycle then return false;
11 return true;
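
The trivial cases ruled out in Lines 4-5 can be illustrated with a small sketch. It is not part of the original algorithm description: the trace encoding (tuples of kind, variable, value), the function name `has_trivial_violation`, and the choice to ignore initial values are all assumptions made for illustration. Writes are assumed unique, as in the VPC-MU setting.

```python
from typing import List, Tuple

# Hypothetical encoding: an operation is (kind, variable, value), kind is "R" or "W".
Op = Tuple[str, str, int]


def has_trivial_violation(p0: List[Op], others: List[List[Op]]) -> bool:
    """Return True if some Read on p0 has no dictating Write,
    or precedes its dictating Write in program order."""
    # Index of each Write on p0, keyed by (variable, value); Writes are assumed unique.
    writes_on_p0 = {
        (var, val): i for i, (kind, var, val) in enumerate(p0) if kind == "W"
    }
    # Writes issued by the other processes.
    writes_elsewhere = {
        (var, val) for proc in others for (kind, var, val) in proc if kind == "W"
    }
    for i, (kind, var, val) in enumerate(p0):
        if kind != "R":
            continue
        key = (var, val)
        if key in writes_on_p0:
            if i < writes_on_p0[key]:
                return True  # case b): r comes before D(r) in program order
        elif key not in writes_elsewhere:
            return True  # case a): r has no dictating Write
    return False
```

For instance, a trace whose only process first reads `x = 1` and only afterwards writes `x = 1` falls into case b).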



Specifically, for each Read operation $r$, it is desirable to construct some legal schedule of the
operations in $r_{\Downarrow}$ (Definition 4.1) so that $r$ can read the value from its dictating
Write. In the legal schedule, some Writes are overwritten and cannot be read again. The algorithm
dynamically maintains a global data structure to keep a record of the possible values for each
variable, namely GlobalActiveWrites (denoted $W_A$). The computation of $W_A$ (compute_activeWrites
in Algorithm 3) is carried out with respect to the newly added operations in $r_{\Downarrow}$
(Line 8) and will be discussed in detail in Section 5.2.1.

Upon each Read $r$, $r$ must read the value from its dictating Write $D(r)$. This observation plays
an important role in the schedule of $r_{\Downarrow}$ (Line 9). In principle, the schedule subroutine
consists of a series of applications of Rule (c). It behaves in a similar way to Step 3 in the
RW-Closure algorithm (Lines 7-14 in Algorithm 1). The major differences between them are twofold.
First, the schedule subroutine focuses on $r_{\Downarrow}$ for each $r$, instead of the whole trace.
Second, the applications of Rule (c) in schedule are carried out in topological order on the
$r_{\Downarrow}$-induced subgraph, resulting in a more efficient algorithm. The detailed design of
schedule will be discussed in Section 5.2.2 and its complexity analysis is given in Section 5.5.

The schedule mentioned above does not necessarily succeed. It may abort with cycle detection
(Line 10), indicating the violation of PRAM consistency. A cycle may appear due to the application
of Rule (c). The key point here is how to dynamically detect a cycle when it appears. This topic
will be discussed in Section 5.2.3.

In summary, the Read-Centric algorithm has two major advantages over the RW-Closure algorithm.
First, it carries out the applications of Rule (c) in some specific order, thus reducing the
time complexity to $O(n^4)$, in contrast to $O(n^5)$ for the RW-Closure algorithm. Second, it
behaves in a somewhat incremental fashion: it is able to locate the violation of PRAM consistency
at the earliest Read.

5.2 Detailed Design 

In this section, we explore in depth the three essential subroutines in the Read-Centric algorithm,
namely compute_activeWrites, schedule, and cycle detection.

5.2.1 How to compute the GlobalActiveWrites incrementally

The data structure GlobalActiveWrites keeps a record of the possible Writes to read from for each
variable.

Definition 5.1 (GlobalActiveWrites ($W_A$))

GlobalActiveWrites $W_A$ is a set of key-value mappings, each of which has the form of:

$v \in V \mapsto W_A[v] \subseteq W_v$,

where $W_v$ stands for the set of Write operations on variable $v$:

$W_v = \{ w \in W \mid var(w) = v \}$.



The basic procedures on $W_A$ are summarized in data structure GlobalActiveWrites. Algorithm 3
(compute_activeWrites) investigates (but is not limited to) how to update $W_A$ with respect to
the newly added operations in $r_{\Downarrow}$.
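
To make the three procedures on $W_A$ concrete, the following is a minimal sketch, not the authors' implementation. The class name, the use of Python dictionaries and sets, and the representation of an operation's LatestWrites as a plain `variable -> Write` dictionary are assumptions made for illustration.

```python
class GlobalActiveWrites:
    """Sketch of W_A: maps each variable to the set of its currently active Writes."""

    def __init__(self):
        self.active = {}  # variable -> set of Write identifiers

    def replace_with(self, var, w):
        # replaceWith: the only active Write on var becomes w.
        self.active[var] = {w}

    def deactivate_from(self, var, latest_writes_of_o):
        # deactivateFrom: remove o's latest Write on var from the active set.
        # latest_writes_of_o is assumed to be a dict: variable -> Write identifier.
        w = latest_writes_of_o.get(var)
        if w is not None:
            self.active.setdefault(var, set()).discard(w)

    def union_with(self, vvmap):
        # unionWith: merge a (variable -> set of Writes) map into W_A.
        for var, writes in vvmap.items():
            self.active.setdefault(var, set()).update(writes)
```

A short usage example: after `replace_with("x", "w1")` and `union_with({"x": {"w2"}})`, both `w1` and `w2` are active on `x`; `deactivate_from("x", {"x": "w1"})` then leaves only `w2`.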

We expect to identify which Writes should be overwritten by the operations in $r_{\Downarrow}$. A
simple observation shows that all the newly added operations in $r_{\Downarrow}$ can be divided into
two groups: rr-interval and ww-interval. Accordingly, Algorithm 3 examines each group to filter out
the Write operations which should be overwritten. The rr-interval (Definition 5.2) consists of all
the Write operations between $r'$ and $r$ (both exclusive). All the operations in $r'_{\Downarrow}$
must be scheduled before those in the rr-interval (Lines 6-7). In contrast, the rest of the
operations in $r_{\Downarrow}$ are in the ww-interval (Definition 5.3).

Definition 5.2 (rr-interval ($Itv_{rr}$))

The rr-interval for $r$ is the set of operations between $r'$ and $r$ on process $p_0$:

$Itv_{rr} = \{ o \mid r' \prec_{po} o \prec_{po} r \}$.

For the first Read $r$, its rr-interval consists of all the operations before it in program order.
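
Definition 5.2 translates directly into a scan of the operations on $p_0$. The sketch below is illustrative only; the tuple encoding of operations and the function name `rr_intervals` are assumptions, not notation from the paper.

```python
def rr_intervals(p0):
    """For each Read on p0 (in program order), return (index of the Read,
    indices of the operations strictly between the previous Read and it)."""
    intervals = []
    prev_read = -1  # index of r'; -1 means that r is the first Read
    for i, (kind, _var, _val) in enumerate(p0):
        if kind == "R":
            intervals.append((i, list(range(prev_read + 1, i))))
            prev_read = i
    return intervals
```

For the first Read the interval starts at index 0, which matches the special case stated in the definition.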

Definition 5.3 (ww-interval ($Itv_{ww}$))

The ww-interval for $r$ is a set of operations:

$Itv_{ww} = r_{\Downarrow} \setminus Itv_{rr}$.

Note that the ww-interval is empty when $D(r)$ is also on $p_0$ (either
$r' \prec_{po} D(r) \prec_{po} r$ or $D(r) \prec_{po} r' \prec_{po} r$). In the other case, the
ww-interval is not empty and all its operations must be scheduled after the latest Write on process
$p(D(r))$ in $r_{\Downarrow}$ (denoted $w_\sigma$) (Lines 16-18 and Line 21).
Data Structure GlobalActiveWrites ($W_A$).

  Subroutine replaceWith(var, w)
    // replace active Writes on var with set {w}
    $W_A[var] \leftarrow \{w\}$;

  Subroutine deactivateFrom(var, o)
    // deactivate latest Write of o on var
    $W_A[var] \leftarrow W_A[var] \setminus o.W_\Sigma[var]$;

  Subroutine unionWith(vvmap)
    // vvmap: a set of ($v \in V$, $w \in W$) mappings
    $W_A \leftarrow W_A \cup vvmap$;

Data Structure LatestWrites ($W_\Sigma$).

  Subroutine updateLatestWrites(op, toOp)
    // update $toOp.W_\Sigma$ depending on $op.W_\Sigma$
    foreach $v \in V$ do
      if $toOp.W_\Sigma[v].wid < op.W_\Sigma[v].wid$
        then $toOp.W_\Sigma[v] \leftarrow op.W_\Sigma[v]$;
      if $op \in W_\exists \wedge toOp.W_\Sigma[v].wid < op.wid$
        then $toOp.W_\Sigma[var(op)] \leftarrow \{op\}$;

Data Structure EarliestRead ($r_{\triangleleft}$).

  Subroutine initEarliestRead(o, r)
    // set the earliest Read reachable from o to r
    $o.r_{\triangleleft} \leftarrow r.id$;

  Subroutine updateEarliestRead(src, sucs)
    // update $r_{\triangleleft}$ of src to the minimum in sucs;
    // return old $r_{\triangleleft}$ of src
    $old \leftarrow src.r_{\triangleleft}$;
    $src.r_{\triangleleft} \leftarrow \min(src.r_{\triangleleft}, \min_{o \in sucs} o.r_{\triangleleft})$;
    return old;

  Subroutine identifyWRPair(w', old)
    // old: the last $r_{\triangleleft}$ of w'
    // identify (w, r) to form the w'wr triple for Rule (c)
    $r \leftarrow$ the earliest Read such that
      $var(r) = var(w') \wedge r.id \in [w'.r_{\triangleleft}, old)$
    return $D(r)$;
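
The two EarliestRead procedures that drive Rule (c) can be sketched as follows. This is a hedged illustration rather than the original code: the dictionary `r_min` (operation to earliest reachable Read index), the lists `reads` and `dictating` (variable and dictating Write of each Read on $p_0$, indexed by Read id), and the snake_case names are assumptions.

```python
def update_earliest_read(r_min, src, sucs):
    """updateEarliestRead: lower src's earliest reachable Read index to the
    minimum over the given operations; return the old index of src."""
    old = r_min[src]
    r_min[src] = min([old] + [r_min[o] for o in sucs])
    return old


def identify_wr_pair(reads, dictating, var, new, old):
    """identifyWRPair: among the Reads with index in [new, old), take the
    earliest one on variable `var` and return its dictating Write.
    Returns None when no such Read exists."""
    for i in range(new, min(old, len(reads))):
        if reads[i] == var:
            return dictating[i]
    return None
```

Only the newly reachable window `[new, old)` is scanned, which is what allows the subroutine to focus on the earliest Read.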



Algorithm 3: Computation of the GlobalActiveWrites (compute_activeWrites).
Data: Operations in $r_{\Downarrow}$.
Result: Update of data structures, including the global $W_A$, and $W_\Sigma$, $r_{\triangleleft}$ for each operation.

 1 (1) dealing with rr-interval:
 2 $o_{pre} \leftarrow r'$;
 3 foreach $w \in Itv_{rr}$ do
 4   Call initEarliestRead($w$, $r$);
 5   Call updateLatestWrites($o_{pre}$, $w$);
 6   // deactivate some Writes
 7   Call replaceWith($var(w)$, $w$);
 8   $o_{pre} \leftarrow w$;
 9 Call updateLatestWrites($o_{pre}$, $r$);
10 (2) dealing with ww-interval:
11 $o_{pre} \leftarrow w_\sigma$;
12 Let $W'_A \leftarrow$ initial copy of $W_A$;
13 foreach $w \in Itv_{ww}$ do
14   Call initEarliestRead($w$, $r$);
15   Call updateLatestWrites($o_{pre}$, $w$);
16   // deactivate some Writes
17   Call deactivateFrom($var(w)$, $w$);
18   $W'_A[v] \leftarrow \{w\}$;
19   $o_{pre} \leftarrow w$;
20 Call updateLatestWrites($o_{pre}$, $r$);
21 Call unionWith($W'_A$);

5.2.2 How to schedule operations in r-downset incrementally

Upon each Read $r$, $r$ must read the value from its dictating Write $D(r)$. As a consequence, any
active Write on variable $v = var(r)$ other than $D(r)$ (i.e., $W_A[v] \setminus \{D(r)\}$) must be
scheduled before $D(r)$ due to Rule (c). The dynamic adding of edges by Rule (c) may change the
reachability relation among operations. Therefore, Rule (c) may apply again. The iteration repeats
until no more edges can be added, or aborts once a cycle is detected. In the following, we identify
and discuss three major issues in schedule (Algorithm 4), all related to Rule (c).

1) How to identify the triple w'wr involved in Rule (c). Owing to the uniqueness of Writes, we 
can focus on the reachability relation between pairs of w' and r. Moreover, all the Reads are on 
the same process. It is sufficient for any Write to keep record of the earliest Read reachable from 
itself. 

Definition 5.4 (EarliestRead ($r_{\triangleleft}$))

Assume that each Read is associated with its index (denoted id) on process $p_0$ and can be accessed
by $R[id]$. The EarliestRead of an operation $o$, $o.r_{\triangleleft}$, is the index of the earliest
Read it can reach:

$o.r_{\triangleleft} = i \iff o \leadsto R[i] \wedge \forall i' < i\, (o \not\leadsto R[i'])$.



The procedures on $r_{\triangleleft}$ are summarized in data structure EarliestRead, including
initialization (called in Line 4 and Line 14 in Algorithm 3) and update. The $r_{\triangleleft}$
(denoted old) of some Write on variable $v$ (denoted $w'$) may be updated (to new) due to the
applications of Rule (c). In other words, $w'$ can now reach the Reads in $R[new \ldots old)$, which
is the set of Reads between $R[new]$ and $R[old - 1]$ (formally,
$R[new \ldots old) = \{ r \in R \mid R[new] \preceq_{po} r \prec_{po} R[old] \}$). If there are Reads
on variable $v$ in $R[new \ldots old)$, new triples for Rule (c) involving $w'$ can be identified.
When there is more than one such Read, the subroutine identifyWRPair (in data structure
EarliestRead) always takes the earliest one. The correctness of this treatment is given in
Lemma 5.2 (Section 5.4). Moreover, the treatment also plays an important role in the complexity
analysis in Section 5.5.
2) How to decide the $w'$ parts of Rule (c). Upon each Read $r$, the series of applications of Rule
(c) in schedule seems to affect all the operations in $r_{\Downarrow}$. However, we can decide in
advance the Writes which may act as the $w'$ parts of Rule (c). The key observation is that the
update of $r_{\triangleleft}$ of an operation only depends on its direct successors (subroutine
updateEarliestRead in data structure EarliestRead). As we have mentioned, the schedule starts with
adding edges from the active Writes in $W_A[var(r)] \setminus \{D(r)\}$ to $D(r)$ (Lines 1-4 in
Algorithm 4). Therefore, the operations which may act as $w'$ of Rule (c) are restricted to the
downset of $D(r)$. We summarize the above arguments in Lemma 5.1.



Lemma 5.1 ($w'$ parts of Rule (c) in schedule)

Upon each Read $r$, the Writes which may act as $w'$ parts of Rule (c) are restricted to
$D(r)_{\Downarrow}$. □

Based on Lemma 5.1, the schedule can now focus on the operations in $D(r)_{\Downarrow}$ (Lines 6-10
in Algorithm 4).

3) How to manage the order of the applications of Rule (c). There are two cases according
to whether $D(r)$ is in $r'_{\Downarrow}$. If $D(r) \notin r'_{\Downarrow}$, no $r_{\triangleleft}$
of the operations in $D(r)_{\Downarrow}$ is updated after the applications of Rule (c) due to the
fact that $r$ reads the value from $D(r)$ (Lines 1-4 in Algorithm 4). Therefore, no more
applications of Rule (c) occur (Line 5). In the following, we assume that
$D(r) \in r'_{\Downarrow}$ and show how to manage the order of the applications of Rule (c). Its
correctness proof and time complexity analysis are given in Lemma 5.3 and Lemma 5.4, respectively.

Algorithm 4: Schedule.
Result: Schedule operations in $r_{\Downarrow}$; return true once cycle appears; false, otherwise.

 1 (1) start schedule with r reading from D(r)
 2 foreach $w' \in W_A[var(r)] \setminus \{D(r)\}$ do
 3   cycle $\leftarrow$ Call applyRuleC($w'$, $D(r)$);
 4   if cycle then return true;
 5 if $D(r) \notin r'_{\Downarrow}$ then return false;
 6 (2) decide the w' parts of Rule (c) in schedule
 7 Let $G \leftarrow$ operation graph corresponding to $r_{\Downarrow}$;
 8 Let $G^T \leftarrow$ transpose of $G$;
 9 Let $D(r)_{\Downarrow} \leftarrow$ downset of $D(r)$;
10 Let $G^T_{D(r)_{\Downarrow}} \leftarrow$ $D(r)_{\Downarrow}$-induced subgraph of $G^T$;
11 (3) prepare for further schedule
12 traverse $G^T_{D(r)_{\Downarrow}}$ to compute for each $o \in D(r)_{\Downarrow}$
13   (a) COUNT: number of direct predecessors
14   (b) PRELIST: list of direct predecessors
15   (c) SUCLIST: list of direct successors;
16 (4) schedule, based on topological sorting
17 // queue containing possible w' part of Rule (c)
18 Let QZERO $\leftarrow$ empty queue;
19 enqueue(QZERO, $D(r)$);
20 // Queue-based topological sorting
21 while QZERO is not empty do
22   $w' \leftarrow$ dequeue(QZERO);
23   if $w' \in W \wedge w'.DONE = false$ then
24     (cycle, $w$) $\leftarrow$ Call checkRuleC($w'$);
25     if cycle then return true;
26     // w is UNDONE as w' part of Rule (c)
27     if $w \neq null \wedge w \in D(r)_{\Downarrow} \wedge (!w.DONE)$ then
28       insert $w'$ into $w$.SUCLIST;
29       insert $w$ into $w'$.PRELIST;
30       $w'.COUNT \leftarrow w'.COUNT + 1$;
31   // erase dependency and identify potential w'(s) of Rule (c)
32   if $w'.COUNT = 0$ then
33     $w'.DONE \leftarrow true$;
34     foreach $s \in w'$.SUCLIST do
35       $s.COUNT \leftarrow s.COUNT - 1$;
36       if $s.COUNT = 0$ then
37         enqueue(QZERO, $s$);
38 return false;



The basic idea is that we examine each operation in $D(r)_{\Downarrow}$ acting as the $w'$ part of
Rule (c) in topological order in the $D(r)_{\Downarrow}$-induced subgraph (Line 10). Specifically,
we choose the Queue-based



topological sorting algorithm [j^ because it can be easily adapted to handle the dynamically added 
edges due to Rule (c). In general, the Queue-based topological sorting, along with the applications
of Rule (c), works as follows: it first puts the single source, $D(r)$, into the QZERO queue (Line
19). Then it repeatedly chooses a Write $w'$ of in-degree 0 (Lines 22-23), checks Rule (c) involving
$w'$ (Lines 24-25), and erases the dependency relation toward its direct successors (Lines 31-37).

However, if the $w'$ part and the $w$ part of the added edge are both in $D(r)_{\Downarrow}$ and $w$
has not been marked DONE yet, it is necessary to examine $w$ first before marking $w'$ as DONE. This
is implemented in the topological sorting framework by imposing a dependency from $w'$ to $w$ and
increasing the in-degree of $w'$ (Lines 26-30).
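
For reference, the plain queue-based topological sorting that the schedule subroutine builds on can be sketched as below. This is the standard algorithm only (often attributed to Kahn); it deliberately omits the Rule (c) hooks and the dynamic dependency of Lines 26-30. The adjacency-dictionary input and the name `queue_based_toposort` are assumptions for illustration.

```python
from collections import deque


def queue_based_toposort(succ):
    """succ maps each node to the list of its direct successors.
    Returns a topological order, or None if the graph contains a cycle."""
    # COUNT: number of direct predecessors (in-degree) of each node.
    count = {u: 0 for u in succ}
    for u in succ:
        for v in succ[u]:
            count[v] = count.get(v, 0) + 1
    # QZERO holds the nodes whose in-degree has dropped to 0.
    qzero = deque(u for u in count if count[u] == 0)
    order = []
    while qzero:
        u = qzero.popleft()
        order.append(u)
        # Erase the dependency of u's direct successors on u.
        for v in succ.get(u, []):
            count[v] -= 1
            if count[v] == 0:
                qzero.append(v)
    # Nodes left with positive in-degree lie on, or behind, a cycle.
    return order if len(order) == len(count) else None
```

In the schedule subroutine, raising the COUNT of $w'$ after it has been dequeued plays the role of postponing $w'$ until $w$ has been processed; the sketch above has no such step.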

Procedure checkRuleC(w')
Data: w': Write acting as the w' part of Rule (c)
Result: Check whether w' really acts as the w' part of Rule (c) and apply Rule (c) if it is true;
        return true if some cycle appears; otherwise return the w part involved in Rule (c)

 1 $old \leftarrow$ Call updateEarliestRead($w'$, $w'$.PRELIST);
 2 $w \leftarrow$ Call identifyWRPair($w'$, $old$);
 3 if $w$ is not null then
 4   cycle $\leftarrow$ Call applyRuleC($w'$, $w$);
 5   if cycle then return true;
 6   else return $w$;

Procedure applyRuleC(w', w)
Data: w', w: the w' and w parts of Rule (c)
Result: apply Rule (c); return true if some cycle appears

 1 add edge from $w'$ to $w$;
 2 // cycle detection
 3 $v \leftarrow var(w)$;
 4 if $w'.W_\Sigma[v].wid > w.wid$ then
 5   return true;
 6 // update related data structures
 7 Call updateEarliestRead($w'$, $\{w\}$);
 8 foreach $o \in \{ o \mid w \leadsto o \leadsto r \}$ do
 9   Call deactivateFrom($var(o)$, $w$);
10   Call updateLatestWrites($w'$, $o$);
11 return false;



5.2.3 How to determine the violation of PRAM consistency incrementally 

The schedule subroutine does not necessarily succeed. It may abort with cycle detection (Line 4
and Line 25 in Algorithm 4). Cycle detection is an important step in the procedure applyRuleC
immediately following the identification of the triple of Rule (c) (Line 4 in procedure checkRuleC).
In the following, we describe the principle of cycle detection and the related data structure.

Assume that Rule (c) is applied and an edge from the $w'$ part to the $w$ part is added. To close
a cycle, we need to know the reachability relation from the $w$ part to the $w'$ part. Note that the
$w$ part of Rule (c) must be some Write having dictated Reads (for simplicity, we denote the set of
such Writes by $W_\exists = \{ w \in W \mid \exists r \in R.\ w \prec_{WR} r \}$). Therefore, we can
focus on the reachability relation between two operations in which the $w$ part is in $W_\exists$.
Furthermore, we can associate each Write in $W_\exists$ with an identifier (denoted wid) according
to its dictated Reads. Formally, $w.wid = \min\{ r.id \mid w \prec_{WR} r \}$. The wid notion captures
the basic idea that the Reads on the same variable can introduce order among their dictating Writes.
In other words, for each variable $v$, the order of Writes on $v$ in $W_\exists$ is determined by
their earliest dictated Reads. Therefore, for each operation, we can further focus on the latest
Write for each variable preceding it.
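
The wid of a Write follows directly from the formula $w.wid = \min\{ r.id \mid w \prec_{WR} r \}$. A minimal sketch, assuming the Reads on $p_0$ are given as a list of their dictating Writes indexed by Read id (the list encoding and the name `compute_wids` are illustrative assumptions):

```python
def compute_wids(dictating_writes):
    """dictating_writes[i] is the dictating Write of the Read with id i.
    Returns a dict mapping each Write in W_exists to its wid, i.e. the
    smallest id among the Reads it dictates."""
    wid = {}
    for rid, w in enumerate(dictating_writes):
        # Reads are visited in increasing id, so the first hit is the minimum.
        wid.setdefault(w, rid)
    return wid
```

Writes that dictate no Read never appear in the result, which corresponds to their not being in $W_\exists$.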

Definition 5.5 (LatestWrites ($W_\Sigma$))

For each operation $o$, its LatestWrites is a set of key-value mappings:

$f : v \in V \mapsto w_\sigma \in W_\exists$,

with the following conditions:

1. $var(w_\sigma) = v$;

2. $w_\sigma \prec o$;

3. for any other $w'_\sigma$ satisfying 1) and 2), we have $w'_\sigma.wid < w_\sigma.wid$.

Figure 3: Illustration of schedule and cycle detection.



Note that the update of $W_\Sigma$ of an operation only depends on its predecessor (data structure
LatestWrites). Based on the data structure, the cycle detection amounts to checking the reachability
relation from the $w$ part to the $w'$ part by comparison between two wids (Lines 2-5 in procedure
applyRuleC).
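
The wid comparison can be illustrated in isolation. The sketch below mirrors the test in procedure applyRuleC under simplifying assumptions: the LatestWrites of $w'$ is passed as a plain `variable -> Write` dictionary, wids are looked up in a separate dictionary, and the function name `closes_cycle` is hypothetical. The intuition is that a Write on $v$ with a smaller wid precedes one with a larger wid, so if the latest Write on $v$ preceding $w'$ has a larger wid than $w$, then $w$ already reaches $w'$ and the new edge from $w'$ to $w$ closes a cycle.

```python
def closes_cycle(latest_writes_of_w_prime, wid, w, var_of_w):
    """Return True if adding the edge w' -> w would close a cycle,
    judged only by comparing wids on the variable of w."""
    latest = latest_writes_of_w_prime.get(var_of_w)
    if latest is None:
        # No Write on this variable is known to precede w'.
        return False
    return wid[latest] > wid[w]
```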

5.3 An Illustrating Example 

Figure 3 shows a running example of the Read-Centric algorithm, mainly concerning the
schedule and cycle detection subroutines. Note that the trace in Figure 3 is similar to that in
Figure 1 used in Section 4.3.

Assume that Rx2 is now under consideration (i.e., $r$ = Rx2 in Line 6 in Algorithm 2). Note
that the edge from Wz3 to Wz1 (with label 1.1) has already been added due to Rz1. The schedule
subroutine starts with adding the edge from Wx3 to Wx2 (with label 2.1) and the edge from Wx5
to Wx2 (with label 2.2) due to Rule (c) (Lines 1-4 in Algorithm 4). The rectangle area covers
all the potential $w'$ parts of Rule (c) (i.e., Wx2$_{\Downarrow}$ in Lines 6-10 in Algorithm 4).
It is the case of $D(r) \in r'_{\Downarrow}$ (i.e., Wx2 $\in$ Rb1$_{\Downarrow}$), so further
schedule with the topological sorting framework is needed (the part of Algorithm 4 ending at
Line 38).

In the running example, Wz2 is chosen to be examined for the $w'$ part of Rule (c) before Wy2
and Wz1, and the edge from Wz2 to Wz1 (with label 2.3) is added (in procedure checkRuleC).
Since Wz1 is UNDONE, we have to examine Wz1 first before marking Wz2 DONE, by increasing
the COUNT of Wz2 (Lines 26-30). According to the topological sorting, Wy2 is examined and
the edge from Wy2 to Wy1 (with label 2.4) is added.

Then, it is Wz1's turn. Since there is a path from Wz1 to Rz3 via the edge with label 2.4,
Rule (c) can be applied and the edge from Wz1 to Wz3 (with label 2.5) is added, closing a cycle
together with the edge from Wz3 to Wz1 (with label 1.1).

5.4 Correctness Proof 



In this section, we prove the correctness of the Read-Centric algorithm, as stated in Theorem 5.1.
To this end, we first justify the correctness of the identifyWRPair subroutine (Lemma 5.2) and the
schedule subroutine (Lemma 5.3).



Lemma 5.2 (Correctness of identifyWRPair)

It is correct for the subroutine identifyWRPair (in data structure EarliestRead) to focus on the
earliest Read with respect to the $w'wr$ order (in Rule (c)), in the sense that:

• the data structures GlobalActiveWrites, EarliestRead, and LatestWrites are updated correctly, and

• the cycle detection in procedure applyRuleC will not miss any cycle. □

Figure 4: The correctness and time complexity of the Read-Centric algorithm. (a) Correctness of
identifyWRPair (Lemma 5.2). (b) "Single Edge" property of schedule (Lemma 5.4).



Proof The correctness proof is illustrated in Figure 4(a), in which all operations are performed on
the same variable. Suppose that there exists a path from $W'$ to $R$ (with label 2). By Rule (c), the
edge from $W'$ to $W$ (with label 3) and the edge from $W'$ to $W''$ (with label 4) should be added.
Nevertheless, the latter one is not necessary, as indicated in identifyWRPair:

• The program order between $R$ and $R''$ implies the edge from $W$ to $W''$ (with label 1). Thus,
the missing of the edge from $W'$ to $W''$ does not break the reachability relation from $W'$ to
$W''$, resulting in no effect on the update of the data structures.

• Any cycle involving the (missing) edge from $W'$ to $W''$ will be closed with the edge from $W'$
to $W$ and the edge from $W$ to $W''$. ■

The correctness of schedule follows from the above lemma. 



Lemma 5.3 (Correctness of schedule (Algorithm 4))

Let $r'$ and $r$ be two consecutive Reads in program order, and suppose that $r$ is under
consideration in Algorithm 2 (Line 6). The schedule in Algorithm 4 is correct with respect to
program order, write-to order, and $w'wr$ order, in the sense that schedule examines exactly all the
three orders considered in the RW-Closure algorithm (Algorithm 1). □

Proof It is trivially valid for program order and write-to order in that they are both static. 

The set of the w'wr triples examined in schedule is a subset of those considered in the RW- 
Closure algorithm. In principle, the schedule can examine all the possible w'wr triples of Rule (c) 
based on the topological order described in schedule. Though some $w'wr$ triples are ignored in
identifyWRPair, the correctness of schedule is still kept by Lemma 5.2.



The correctness of the Read-Centric algorithm can be proved by induction on the Read operations. 
Theorem 5.1 (Correctness of the Read-Centric Algorithm) 

The instance of the VPC-MU problem satisfies PRAM consistency if and only if the Read-Centric
algorithm (Algorithm 2) terminates with a DAG. □

Proof By induction on the Reads in program order, together with Lemma 5.3 as the induction
step, we can prove that the Read-Centric algorithm is equivalent to the RW-Closure algorithm
(Algorithm 1) with respect to program order, write-to order, and $w'wr$ order. This completes the
correctness proof by Theorem 4.1.

Data structure                        | Subroutine         | Worst-case time complexity
GlobalActiveWrites ($W_A$)            | replaceWith        | $O(1)$
                                      | deactivateFrom     | $O(n)$
                                      | unionWith          | $O(n^2)$
EarliestRead ($r_{\triangleleft}$)    | initEarliestRead   | $O(1)$
                                      | updateEarliestRead | $O(n)$
                                      | identifyWRPair     | $O(n)$
LatestWrites ($W_\Sigma$)             | updateLatestWrites | $O(|V|) = O(n)$

Table 2: The worst-case time complexity of subroutines in data structures.



5.5 Time and Space Complexity 

The worst-case time complexity of the Read-Centric algorithm is dominated by the cost of schedule,
which relies on Lemma 5.4. The cost for manipulating data structures is summarized in Table 2.

Lemma 5.4 ("Single Edge" property of schedule (Algorithm 4))

Upon each Read $r$, the subroutine schedule (Algorithm 4) adds at most one edge going out from
each Write in $D(r)_{\Downarrow}$. □

Proof There are two possible cases for schedule to add the second edge going out from some Write
$w'$ in $D(r)_{\Downarrow}$. The first case is concerned with the subroutine identifyWRPair and is
dealt with in Lemma 5.2. The other case is concerned with the topological sorting framework.
Specifically, an edge from $w'$ to another Write $w$ is added. The Write $w$ is in
$D(r)_{\Downarrow}$ and has not been marked DONE yet (Line 26 in Algorithm 4). In this case, we prove
by contradiction that no edges going out from $w'$ can be added when $w'$ is examined again (after
$w$ is marked DONE). Otherwise, we show that some cycle appears before $w'$ is examined again.
Figure 4(b) gives an illustration, in which all the operations are performed on the same variable
$v$.

Let $W'WR$ be the triple conforming with Rule (c). The edge from $W'$ to $W$ (with label 2.1)
is added, and $W$ is in $D(r)_{\Downarrow}$ and has not been marked DONE yet. Assume that the second
edge going out from $W'$ will be added when $W'$ is examined again. This assumption implies that the
$r_{\triangleleft}$ of $W'$ is updated via $W$ and a new Read $R''$ (on variable $v$) is now
reachable. Thus, when $W$ is examined, $R''$ is reachable from $W$ (path with label 3), and the edge
from $W$ to $W''$ (with label 2.2) is added, closing a cycle with the edge from $W''$ to $W$ (with
label 1). The subroutine schedule aborts. Contradiction. ■



The overall worst-case time complexity is given in Theorem 5.2.

Theorem 5.2 (Time Complexity of the Read-Centric Algorithm)

The worst-case time complexity of the Read-Centric algorithm (Algorithm 2) is in $O(n^4)$, where $n$
is the total number of operations. □

Proof Suppose that Read $r$ is under consideration and there are at worst $n$ operations in
$r_{\Downarrow}$.

The worst-case time complexity of the subroutine compute_activeWrites is in $O(n^2)$ (Line 8).
The worst-case time complexity of the subroutine schedule (Line 9) consists of four parts in
Algorithm 4:

1. $O(n^3)$ for startup (Lines 1-4).

2. $O(n + m) = O(n^2)$ to decide the $w'$ parts of Rule (c) (Lines 6-10), where $m$ is the total
number of edges.

3. $O(n + m) = O(n^2)$ for preparation (Lines 11-15).

4. $O(n + m) + O(n \cdot (n + m + n^2)) = O(n^3)$ to schedule (Lines 16-38), where the first term is
for the topological sorting framework and the latter one is for the applications of Rule (c) in
procedure applyRuleC for at most $n$ times by Lemma 5.4.



In summary, Algorithm 4 costs $O(n^3)$ in the worst case.

Therefore, the worst-case time complexity of the Read-Centric algorithm is in $O(n^4)$ for at
most $n$ iterations.
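
As a worked restatement of the accounting above, the bounds can be chained as follows. This is only a rearrangement of the terms stated in the proof; the bound $m \le n^2$ on the number of edges is an assumption (a simple directed graph on $n$ nodes), and the names $T_{\mathrm{schedule}}$ and $T_{\mathrm{compute}}$ are introduced here for illustration.

```latex
\begin{align*}
T_{\mathrm{schedule}}(n)
  &= O(n + m) + O\bigl(n \cdot (n + m + n^2)\bigr)
     && \text{dominant part (4) of Algorithm 4} \\
  &= O(n^2) + O(n \cdot n^2) = O(n^3)
     && \text{using } m \le n^2, \\
T_{\mathrm{Read\text{-}Centric}}(n)
  &\le n \cdot \bigl( T_{\mathrm{compute}}(n) + T_{\mathrm{schedule}}(n) \bigr)
     && \text{at most } n \text{ Reads} \\
  &= n \cdot O(n^3) = O(n^4)
     && T_{\mathrm{compute}} \text{ is dominated by } T_{\mathrm{schedule}}.
\end{align*}
```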



Clearly, the space complexity of the Read-Centric algorithm is in
$O(1 \cdot n^2) + O(n \cdot 1) + O(n \cdot n) = O(n^2)$, consisting of the space for the global
$W_A$, $r_{\triangleleft}$ for each operation, and $W_\Sigma$ for each operation.

6 Performance Evaluation 



The worst-case time complexities of the two algorithms given in Section 4.5 and Section 5.5 are both
asymptotic. In this section, we further evaluate their performance by experiments.

6.1 Experiment Setup 



Figure 5: Performance evaluation of two verification algorithms (RW-Closure and Read-Centric).
(a) Time cost of two verification algorithms over random traces. (b) Time cost of two verification
algorithms over valid traces.



We carry out two groups of experiments. The first group runs on randomly generated traces and the
other group on valid traces. Valid traces are known in advance to be PRAM consistent. Note that
both algorithms running on valid traces have to access all the operations. For each group of
experiments, both the number of processes and the number of operations are tunable to investigate
their impacts on the time cost (in ms). The number of operations varies from 500 to 30000, while the
number of processes varies from 2 to 20. In addition, we continue to increase the number of
operations for the Read-Centric algorithm to justify its scalability when the RW-Closure algorithm
costs too much time. All the experiments are conducted on a machine with a 3.40GHz CPU and 1280M
RAM (virtual machine) running Win7 (64-bit).

Figure 6: Time cost of the Read-Centric algorithm over valid traces.



6.2 Experimental Result 

As shown in Figure 5, the Read-Centric algorithm outperforms the RW-Closure algorithm in time
cost both over random traces and over valid traces. The more operations there are, the more
significant the advantage becomes. Specifically, when both run on valid traces, the Read-Centric
algorithm performs more efficiently than the RW-Closure algorithm, with at most 694x speedup
over the valid trace with 20 processes and 8000 operations. In addition, we increase the number
of operations to investigate the scalability of the Read-Centric algorithm. As shown in Figure 6,
the Read-Centric algorithm can efficiently deal with the valid executions with 20 processes and up
to 60000 operations (in less than 600s). In contrast, it takes the RW-Closure algorithm more than
3000s over the valid trace with 20 processes and 8000 operations. In summary, the above results
justify the scalability and thus the practical importance of the Read-Centric algorithm.

7 Conclusion and Future Work 

In this work, we have studied the problem of verifying PRAM consistency over Read/Write traces
with Multiple variables and Unique Write operations (VPC-MU) and proposed two polynomial
algorithms for it. The RW-Closure algorithm works with the worst-case time complexity $O(n^5)$,
where $n$ is the total number of operations in the Read/Write trace. The Read-Centric algorithm
reduces the time complexity to $O(n^4)$.

In Section 3, we identify four variants of the problem of verifying PRAM consistency (VPC).
We hope to address the complexity issues of other variants in future work, including the VPC-MD
problem and the VPC-SD problem listed in Table 1. The verification problems with respect to
other weak consistency conditions also deserve consideration, e.g., causal consistency [2]. Moreover,
Golab et al. [15] introduce the useful notion of quantifying the severity of the violations. For weak
consistency models, it is also desirable to define appropriate quantities for the severity of
violations and design efficient algorithms to evaluate them.